A motion video compression system with multiresolution features.

ABSTRACT:

A system and methods for processing a stream of video image data so as to create a video representation
that multiplexes data corresponding to resolution or bitstream scales. This representation is such that the identity
of the basic MacroBlock (MB) structure of the MPEG-1 ISO standard is preserved across all resolution and
bitstream scales, e.g., by scaling across four levels of resolution. An MB is associated with a series of attributes
which contribute to the amount of overhead data incorporated in an MPEG-1 compressed data stream, so that
by preserving the MB identity across multiple resolutions and bitstream scales, these scales can share this
overhead, thus requiring it to be included only once in the data stream. Preserving the MB identity also
simplifies significantly the derivation of motion estimation vector data for all resolution scales other than the
highest resolution. Essentially, the motion vector data corresponding to any resolution scale can be derived from
the highest resolution motion vector data by appropriately scaling it down.
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION

The present invention relates to the field of data compression and, more particularly, to a system and
techniques for compressing and decompressing digital motion video signals at a multiplicity of scales. The
techniques expand on algorithms similar to the emerging MPEG standard proposed by the International
Standards Organization's Moving Picture Experts Group (MPEG).
Technological advances in digital transmission networks, digital storage media, Very Large Scale
Integration devices, and digital processing of video and audio signals are converging to make the
transmission and storage of digital video economical in a wide variety of applications. Because the storage
and transmission of digital video signals is central to many applications, and because an uncompressed
representation of a video signal requires a large amount of storage, the use of digital video compression
techniques is vital to this advancing art. In this regard, several international standards for the compression of
digital video signals have emerged over the past decade, with more currently under development. These
standards apply to algorithms for the transmission and storage of compressed digital video in a variety of
applications, including: video-telephony and teleconferencing; high quality digital television transmission on
coaxial and fiber-optic networks as well as broadcast terrestrially and over direct broadcast satellites; and
in interactive multimedia products on CD-ROM, Digital Audio Tape, and Winchester disk drives.

Several of these standards involve algorithms based on a common core of compression techniques,
e.g., the CCITT (Consultative Committee on International Telegraphy and Telephony) Recommendation
H.120, the CCITT Recommendation H.261, and the ISO/IEC MPEG standard. The MPEG algorithm has
been developed by the Moving Picture Experts Group (MPEG), part of a joint technical committee of the
International Standards Organization (ISO) and the International Electrotechnical Commission (IEC). The
MPEG committee has been developing a draft standard for the multiplexed, compressed representation of
video and associated audio signals. The standard specifies the syntax of the compressed bit stream and a
method of decoding a digital video signal at one level of spatial resolution. This draft standard will be
referred to as the MPEG-1 standard or algorithm, in order to distinguish it from newer algorithms now
under discussion by the same committee. The MPEG-1 draft standard is described in document ISO/IEC
JTC1/SC2/WG11 MPEG 91/090 of May 1991.

As the present invention may be applied to extend the functions of an MPEG-1 decoder to produce a
multiplicity of video resolutions from the same compressed bit stream, some pertinent aspects of the
MPEG-1 video compression algorithm will be reviewed. It is to be noted, however, that the invention can
also be applied to other video coding algorithms which share some of the features of the MPEG algorithm.

The MPEG-1 Video Compression Algorithm



To begin with, it will be understood that the compression of any data object, such as a page of text, an
image, a segment of speech, or a video sequence, can be thought of as a series of steps, including: 1) a
decomposition of that object into a collection of tokens; 2) the representation of those tokens by binary
strings which have minimal length in some sense; and 3) the concatenation of the strings in a well-defined
order. Steps 2 and 3 are lossless, i.e., the original data is faithfully recoverable upon reversal, and Step 2 is
known as entropy coding. (See, e.g., T. BERGER, Rate Distortion Theory, Englewood Cliffs, NJ:
Prentice-Hall, 1977; R. McELIECE, The Theory of Information and Coding, Reading, MA: Addison-
Wesley, 1971; D. A. HUFFMAN, "A Method for the Construction of Minimum Redundancy Codes," Proc.
IRE, pp. 1098-1101, September 1952; G. G. LANGDON, "An Introduction to Arithmetic Coding," IBM J.
Res. Develop., vol. 28, pp. 135-149, March 1984.) Step 1 can be either lossless or lossy in general. Most
video compression algorithms are lossy. A successful lossy compression algorithm eliminates redundant
and irrelevant information, allowing relatively large errors where they are not likely to be visually significant
and carefully representing aspects of a sequence to which the human observer is very sensitive. The
techniques employed in the MPEG-1 algorithm for Step 1 can be described as predictive/interpolative
motion-compensated hybrid DCT/DPCM coding. Huffman coding, also known as variable length coding
(see the above cited Huffman 1952 paper), is used in Step 2. Although, as mentioned, the MPEG-1
standard is really a specification of the decoder and the compressed bit stream syntax, the following
description of the MPEG-1 specification is, for ease of presentation, primarily from an encoder point of
view.

The MPEG-1 video standard specifies a coded representation of video for digital storage media, as set
forth in ISO/IEC JTC1/SC2/WG11 MPEG CD-11172, MPEG Committee Draft, 1991. The algorithm is
designed to operate on non-interlaced component video, although it can be extended to operate with
interlaced video by combining two consecutive interlaced fields into a single picture. Each picture has three
components: luminance (Y), red color difference (Cr), and blue color difference (Cb). The Cr and Cb
components each have half as many samples as the Y component in both horizontal and vertical directions.
Further, the algorithm operates with a single level of video resolution.

Layered Structure of an MPEG-1 Sequence

An MPEG-1 data stream consists of a video stream and an audio stream which are packed, together
with systems information and possibly other bitstreams, into a systems data stream that can be regarded as
layered. Within the video layer of the MPEG-1 data stream, the compressed data is further layered. The
highest layer is the Video Sequence Layer, containing control information and parameters for the entire
sequence. A description of the organization of the other layers will aid in understanding the invention. These
layers of the MPEG-1 Video Layered Structure are shown in Figures 1-4. Specifically, the figures show:
Figure 1: Groups of Pictures (GOPs).
Figure 2: Macroblock (MB) subdivision of a picture.
Figure 3: Slice subdivision of a picture (example).
Figure 4: Block subdivision of a macroblock.

The layers pertain to the operation of the compression algorithm as well as the composition of a
compressed bit stream. As noted, the highest layer is the Video Sequence Layer, containing control
information and parameters for the entire sequence. At the next layer, a sequence is subdivided into sets of
consecutive pictures, each known as a Group of Pictures (GOP). A general illustration of this layer is
shown in Figure 1. Decoding may begin at the start of any GOP, essentially independent of the preceding
GOPs. There is no limit to the number of pictures which may be in a GOP, nor do there have to be equal
numbers of pictures in all GOPs.

The third or Picture layer is a single picture. A general illustration of this layer is shown in Figure 2.
The luminance component of each picture is
subdivided into 16 x 16 regions and the color difference components are subdivided into 8 x 8 regions
spatially co-sited with the 16 x 16 luminance regions. Taken together, the co-sited luminance region and
color difference regions make up the fifth layer, known as a macroblock (MB).

Between the Picture and MB layers is the fourth or slice layer. Each slice consists of an arbitrary or
optional number of consecutive MBs. Slices need not be uniform in size within a picture or from picture to
picture. They may be only a few macroblocks in size or extend across multiple rows of MBs, as shown in
Figure 3.

An MB is a fundamental layer to which various attributes can be associated, as will be seen below. The
basic structure of an MB consists of four luminance blocks and two chrominance blocks, as seen in Figure
4. All of these blocks are of size 8 x 8 in MPEG-1. Preserving the structure and attributes of an MB (not
necessarily its size) across a multiplicity of picture resolutions is one of the goals of this invention.
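
As an illustration of the macroblock structure described above, the following Python sketch splits one 16 x 16
luminance region and its two co-sited 8 x 8 color difference regions into six 8 x 8 blocks, and counts the
macroblocks in a picture. The function names, the block ordering (four luminance blocks in raster order followed by
the two color difference blocks), and the rounding up of partial macroblocks are assumptions made for the example,
not details taken from the text.

```python
def split_macroblock(y16, c1_8, c2_8):
    """Return the six 8x8 blocks of one macroblock.

    y16 is a 16x16 list of luminance rows; c1_8 and c2_8 are the two
    co-sited 8x8 color difference regions. The output order (four
    luminance blocks in raster order, then the two color difference
    blocks) is an assumption of this sketch.
    """
    def sub_block(region, row0, col0):
        return [row[col0:col0 + 8] for row in region[row0:row0 + 8]]

    luminance = [sub_block(y16, r, c) for r in (0, 8) for c in (0, 8)]
    return luminance + [c1_8, c2_8]


def macroblocks_per_picture(width, height):
    """Count 16x16 macroblocks, rounding partial macroblocks up (assumed)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


if __name__ == "__main__":
    y = [[16 * r + c for c in range(16)] for r in range(16)]
    chroma_a = [[0] * 8 for _ in range(8)]
    chroma_b = [[0] * 8 for _ in range(8)]
    blocks = split_macroblock(y, chroma_a, chroma_b)
    print(len(blocks), macroblocks_per_picture(352, 240))
```

The picture size 352 x 240 in the usage lines is only an illustrative value.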

Within a GOP, three types of pictures can appear. The distinguishing difference among the picture
types is the compression method used. Intramode pictures or I-pictures are compressed independently of
any other picture. Although there is no fixed upper bound on the distance between I-pictures, it is
expected that they will be interspersed frequently throughout a sequence to facilitate random access and
other special modes of operation. Each GOP must start with an I-picture and additional I-pictures can
appear within the GOP. The other types of pictures, predictively motion-compensated pictures
(P-pictures) and bidirectionally motion-compensated pictures (B-pictures), will be described in the
discussion on motion compensation below. A general illustration is shown in Figure 5.

Motion Compensation

Most video sequences exhibit a high degree of correlation between consecutive pictures. A useful
method to remove this redundancy prior to coding a picture is "motion compensation". Motion compensation
requires some means for modeling and estimating the motion in a scene. In MPEG-1, each picture is
partitioned into macroblocks and each MB is compared to 16 x 16 regions in the same general spatial
location in a predicting picture or pictures. The region in the predicting picture(s) that best matches the MB
in some sense is used as the prediction. The difference between the spatial location of the MB and that of
its predictor is referred to as a motion vector. Thus, the outputs of the motion estimation and compensation
for an MB are motion vectors and a motion-compensated difference macroblock. These can generally be
compressed more than the original MB itself. Pictures which are predictively motion-compensated using a
single predicting picture in the past, i.e., forward-in-time in the sequence, are known as P-pictures.

In MPEG-1, the time interval between a P-picture and its predicting picture can be greater than one
picture interval. For pictures that fall between P-pictures or between an I-picture and a P-picture,
backward-in-time prediction may be used in addition to forward-in-time prediction. Such pictures are
known as bidirectionally motion-compensated pictures (B-pictures). For B-pictures, in addition to
forward and backward prediction, interpolative motion compensation is allowed, in which the predictor is an
average of a block from the previous predicting picture and a block from the future predicting picture. In this
case, two motion vectors are needed.
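
The matching step described above can be sketched in Python as follows. This is a minimal illustration, not the
method prescribed by MPEG-1 or by this invention: the text only says the best match is chosen "in some sense", so
the sum of absolute differences (SAD), the exhaustive search window, the function names, and the rounding used when
averaging are all assumptions of the example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference region displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=16, search=4):
    """Exhaustive search over a +/- `search` pixel window.

    Returns (dx, dy, cost) for the displacement with the lowest SAD.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue  # candidate region falls outside the picture
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def interpolative_prediction(past_block, future_block):
    """Average a past and a future block (rounding rule is assumed)."""
    return [[(p + f + 1) // 2 for p, f in zip(prow, frow)]
            for prow, frow in zip(past_block, future_block)]
```

For an interpolative MB, `best_motion_vector` would be run once against the past anchor picture and once against the
future one, giving the two motion vectors mentioned above.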

The use of bidirectional motion compensation leads to a two-level motion compensation structure, as
depicted in Figure 5. Each arrow indicates the prediction of the picture touching the arrowhead using the
picture touching the dot. Each P-picture is motion-compensated using the previous P-picture (or
I-picture, as the case may be). Each B-picture is motion-compensated by the P- or I-pictures
immediately before and after it. These predicting pictures are sometimes referred to as "anchor" pictures.
No limit is specified in MPEG-1 on the distance between anchor pictures, nor on the distance between
I-pictures. In fact, these parameters do not have to be constant over an entire sequence. Referring to the
distance between I-pictures as N and to the distance between P-pictures as M, the sequence shown in
Figure 5 has (N,M) = (9,3).

It should therefore be understood that an MPEG-1 sequence consists of a series of I-pictures which
may have none or one or more P-pictures sandwiched between them. The various I- and P-pictures
may have no B-pictures or one or more B-pictures sandwiched between them, in which latter event they
are anchor pictures.
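
The picture-type pattern just described can be sketched in Python. The function below lists picture types in
display order for a sequence whose I-picture spacing is N and whose anchor-picture spacing is M; the function name,
the requirement that N be a multiple of M, and the sample values used in the usage line are assumptions of this
illustration.

```python
def picture_types(n, m, count):
    """Return `count` picture types in display order.

    n: distance between I-pictures; m: distance between anchor pictures.
    Assumes n is a multiple of m and that both stay constant, which the
    text notes is not actually required.
    """
    types = []
    for i in range(count):
        position = i % n  # position inside the current I-to-I period
        if position == 0:
            types.append("I")
        elif position % m == 0:
            types.append("P")
        else:
            types.append("B")
    return types


if __name__ == "__main__":
    print("".join(picture_types(9, 3, 10)))
```

With M = 1 the same function yields a sequence with no B-pictures at all, matching the case in which the anchor
pictures have nothing sandwiched between them.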


Transformation and Quantization of an MB

One very useful image compression technique is transform coding. (See N. S. JAYANT and P. NOLL,
Digital Coding of Waveforms, Principles and Applications to Speech and Video, Englewood Cliffs, NJ:
Prentice-Hall, 1984, and A. G. TESCHER, "Transform Image Coding," in W. K. Pratt, editor, Image
Transmission Techniques, pp. 113-155, New York, NY: Academic Press, 1979.) In MPEG-1 and several
other compression standards, the discrete cosine transform (DCT) is the transform of choice. (See K. R.
RAO and P. YIP, Discrete Cosine Transform, Algorithms, Advantages, Applications, San Diego, CA:
Academic Press, 1990, and N. AHMED, T. NATARAJAN, and K. R. RAO, "Discrete Cosine Transform,"
IEEE Transactions on Computers, pp. 90-93, January 1974.) The compression of an I-picture, for
example, is achieved by taking the DCT of the blocks of luminance and chrominance pixels within an MB,
quantizing the DCT coefficients, and Huffman coding the result. Similar principles apply to the compression
of P- and B-pictures except that, in these cases, the DCT may be applied to the difference between the
blocks of pixels within an MB and their corresponding motion-compensated prediction.

The DCT converts a block of n x n pixels into an n x n set of transform coefficients. The DCT is very
useful in compression applications, because it tends to concentrate the energy of the block of pixel data
into a few of the DCT coefficients, and further, the DCT coefficients are nearly independent of each other.
Like several of the international compression standards, the MPEG-1 algorithm uses a DCT block size of 8
x 8, which corresponds to the size of the blocks within an MB. It is one purpose of this invention to use
DCTs of other sizes in supporting pictures of multiple resolutions.
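
The energy-concentrating behavior of the DCT can be seen in a small pure-Python sketch. It uses the orthonormal
DCT-II, which is one common normalization; the normalization, the helper names, and the smooth test block are
assumptions of this example, and MPEG-1 itself only constrains the accuracy of the inverse transform in a decoder.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    """Two-dimensional DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse of dct2: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def energy_fraction(coeffs, size):
    """Share of total squared energy held by the upper-left size x size corner."""
    total = sum(v * v for row in coeffs for v in row)
    kept = sum(coeffs[m][n] ** 2 for m in range(size) for n in range(size))
    return kept / total


if __name__ == "__main__":
    smooth = [[100 + 2 * x + 3 * y for x in range(8)] for y in range(8)]
    print(energy_fraction(dct2(smooth), 2))
```

For a smooth block such as the one in the usage lines, nearly all of the energy falls in the few lowest-frequency
coefficients, which is the property that makes coding only the upper-left coefficients a plausible route to a
lower-resolution picture.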

The next step is quantization of the DCT coefficients, which is the primary source of lossiness in the
MPEG-1 algorithm. Denoting the elements of the two-dimensional array of DCT coefficients by $c_{mn}$,
where m and n can range from 0 to 7, aside from truncation or rounding corrections, quantization is
achieved by dividing each DCT coefficient $c_{mn}$ by $w_{mn} \times QP$, with $w_{mn}$ being a weighting factor and QP
being the quantizer parameter. The weighting factor $w_{mn}$ allows coarser quantization to be applied to the
less visually significant coefficients. There can be two sets of these weights, one for I-pictures and the
other for P- and B-pictures. Custom weights may be transmitted in the video sequence layer. The
quantizer parameter QP is the primary means of trading off quality vs. bit-rate in MPEG-1. It is important
to note that QP can vary from MB to MB within a picture. It is also important to note that in this invention it
is possible to choose either to provide separate weight matrices for the DCTs of other sizes or to provide
weight matrices of different sizes which are mathematically related so as to facilitate decoder processing.
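
The division described above can be sketched in Python as follows. This is a simplified illustration of dividing
each coefficient by its weight times QP; the function names, the rounding rule (ties away from zero), and the
sample numbers are assumptions, and the exact scaling constants and rounding corrections of the MPEG-1 draft are
deliberately left out.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with ties away from zero (assumed rule)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, weights, qp):
    """Divide each coefficient c[m][n] by w[m][n] * QP and round."""
    return [[round_half_away(c / (w * qp)) for c, w in zip(crow, wrow)]
            for crow, wrow in zip(coeffs, weights)]


def dequantize(levels, weights, qp):
    """Approximate inverse: multiply each level by w[m][n] * QP."""
    return [[q * w * qp for q, w in zip(qrow, wrow)]
            for qrow, wrow in zip(levels, weights)]


if __name__ == "__main__":
    coeffs = [[100, -37], [12, 3]]      # hypothetical 2x2 corner of a DCT block
    weights = [[8, 16], [16, 16]]       # hypothetical weighting factors
    levels = quantize(coeffs, weights, qp=2)
    print(levels, dequantize(levels, weights, qp=2))
```

Raising QP enlarges every step size at once, which is why it acts as the main quality versus bit-rate control,
while the weights shape how the loss is spread over the coefficients.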
Following quantization, the DCT coefficient information for each MB is organized and coded, using a set
of Huffman codes. The details of this step are not essential to an understanding of the invention, so that no
description will be given, but for further information thereon reference may be had to the previously-cited
HUFFMAN 1952 paper.

Macroblock Attributes due to Motion Compensation 

It will be appreciated that there are three kinds of motion compensation which may be applied to MBs:
forward, backward, and interpolative. The encoder must select one of these modes. For some MBs, none of
the motion compensation modes yields an accurate prediction. In such cases, the MB may be selected for
intramode coding as with I-pictures. Thus, depending on the motion compensation mode, MBs can be of
the following types:

• forward

• backward

• interpolative

• intra

Also, in P-pictures, depending on the value of the motion vector, MBs can be either of the type with motion
vector zero or of the non-zero type. These types, together with the required motion vector data, are coded
with every MB as overhead data. The exceptions are skipped MBs, as will be explained below.

Macroblock Attributes due to Transformation and Quantization

As discussed previously, the QP parameter can be changed on an MB to MB basis. When this change
takes place, additional MB types are used to indicate that a new QP should be used. The new QP value
itself is transmitted together with the MB.

After applying the DCT and quantization to the blocks within an MB, it may result that some of the
blocks contain only zeros. These blocks do not require further data to be coded and are signalled by a
so-called coded block pattern code. This code represents additional overhead.
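
A coded block pattern can be pictured as one flag per block of the MB. The Python sketch below builds such a
pattern for the six blocks of a macroblock; the function name and the bit order (first block in the most
significant bit) are assumptions of the example, and the variable length code that would actually carry the
pattern in the bit stream is not shown.

```python
def coded_block_pattern(blocks):
    """Return an integer with one bit per block, set when the block
    holds at least one nonzero quantized coefficient.

    The first block maps to the most significant bit (assumed order).
    """
    pattern = 0
    for block in blocks:
        has_data = any(v != 0 for row in block for v in row)
        pattern = (pattern << 1) | (1 if has_data else 0)
    return pattern


if __name__ == "__main__":
    empty = [[0] * 8 for _ in range(8)]
    busy = [[0] * 8 for _ in range(8)]
    busy[0][0] = 5
    print(bin(coded_block_pattern([busy, empty, empty, busy, empty, busy])))
```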

Finally, whenever MBs contain no additional new information, they can also be skipped. To convey this
information, an MB address is also transmitted together with every non-skipped MB.

It should be noted then that MBs carry with them a series of attributes that need to be described by
including overhead data with each coded MB. It is one object of this invention to preserve the identity of
MBs across a multiplicity of scales such that the overhead is included only once, except perhaps for the
refinement of some parameters such as the accuracy of the motion vectors.

THE PROBLEM 

It should be understood, therefore, from the foregoing description of the MPEG-1 video algorithm that
the purpose of the MPEG-1 standard is to specify the syntax of a compressed bit stream corresponding to
a video sequence, and to specify the methods used to decode the sequence at a single level of spatial
resolution. The problem to which the present invention is addressed is extending the specification of the
syntax and the decoding methods of MPEG-1 so that digital video sequences can be decoded and
encoded at a multiplicity of scales. For present purposes, two styles of scaling are distinguished:

1. Resolution Scaling: This refers to the ability of generating a bitstream that can be decoded at a
multiplicity of spatial resolutions by selecting different portions of the bitstream. This feature is desirable
in some applications where multiple video windows must be displayed in a full resolution screen. It is
also desirable because it permits the implementation of decoders of varying degrees of complexity, such
that very simple decoders may be possible that decode only the lower spatial resolutions.

2. Bitstream Scaling: This refers to the ability of generating a bitstream in which some coded bits can be
disregarded, and a usable image still results. A resolution scalable algorithm is also a bitstream scalable
algorithm; however, bitstream scaling is interpreted in a more narrow sense here. Bitstream scalability is
intended to mean decompressing to full resolution always. This is a useful feature for graceful
degradation of the quality of decompressed video when some of the compressed bitstream data are
corrupted by noise.

The present invention is directed to extending the MPEG-1 decoding methods in order to support
these two forms of scalability, by providing a method and apparatus which supports resolution and
bitstream scaling by hierarchical coding of the 8x8 DCT components. The method is flexible: one or more of
these resolution scales can be stacked up to support many applications. The exact number of hierarchical
layers is left to the designer of the encoder or the requirements of the application. In addition, this invention
addresses the problem of extending the syntax and methods of MPEG-1 such that the attributes of a
macroblock are preserved across the various hierarchical layers.

PRIOR ART

There are innumerable articles in the technical literature on the subject of hierarchical coding
techniques, which are relevant to the subject matter of the present invention. Many of these references deal with
the subjects of subband coding and pyramid coding. Two recent books that review these subjects are:

• "Subband Image Coding," J. W. WOODS, editor, Kluwer Academic Publishers, 1991.

• "Digital Image Compression Techniques," M. RABBANI and P. JONES, SPIE Optical Engineering Press,
Bellingham, WA, USA, 1991.

The following three documents are believed to be the most relevant prior art to this invention:

• "Alternative to the hierarchical scheme," Ch. GUILLEMOT, T. N'GUYEN, and A. LEGER,
ISO/JTC1/SC2/WG8, JPEG N-260, February 1989.

• "Setup of CCIR 601 multi-purpose coding scheme," PTT RESEARCH, the Netherlands, ISO/IEC
JTC1/SC2/WG11 MPEG 91/051, May 1991.

• "Compatible Coding of CCIR 601 images: Predict the prediction error," PTT RESEARCH, the
Netherlands, ISO/IEC JTC1/SC2/WG11 MPEG 91/114, August 1991.

The first document describes a hierarchical scheme for compressing multiresolution still images, in which
the DCT coefficients of the lower resolution images are used to predict the higher resolution DCT
coefficients. This scheme differs from this invention in that DCTs are always of the same size. In addition,
there is no consideration of applying the scheme to compressing video segmented into blocks of pixels with
common attributes. The second, and particularly the third, document describe a scheme that resembles the
present invention when applied to two layers of resolution. The purpose of the scheme in these teachings is
to utilize two levels of resolution scale such that pictures compatible with the CCIR 601 format as well as
the MPEG-1 SIF format are generated. There is no attempt to generalize further the technique for a coder
that is scalable in resolution and bitstream. Furthermore, the attributes of a Macroblock are not preserved
across the two scale levels; specifically, motion compensation vectors at the higher resolution scale are
specified at the 16x8 block level rather than the 32x16 CCIR 601 Macroblock level. This means that this
attribute cannot be shared by the resolution scales. Also, no details are provided on how other attributes are
handled.

OBJECTS 


In contrast to the foregoing prior art systems and algorithms, it is an object of the present invention to
provide a flexible syntax and encoding/decoding architecture for compressing video sequences, which
permit the decoding of video at a multiplicity of spatial resolution and bitstream scales.

It is another object of the present invention to provide a system and algorithms for supporting a
multiplicity of scales in a way that extends the existing syntax and methods of the MPEG-1 standard. Such
extensions are performed with a minimum of additional overhead information.

It is a further object of the present invention to provide encoder and decoder implementations in
keeping with the system and algorithms of the invention.

SUMMARY OF THE INVENTION
The present invention involves a system and methods for processing a stream of video image data in
such manner as to create a representation for video data that multiplexes data corresponding to resolution
or bitstream scales. This representation is such that the identity of the basic MacroBlock (MB) structure of
MPEG-1 is preserved across all resolution and bitstream scales. Figure 6 shows how the MB identity is
preserved by scaling across four levels of resolution. It is important to preserve this identity because an MB
is associated with a series of attributes which contribute to the amount of overhead data incorporated in an
MPEG-1 compressed data stream. By preserving the MB identity across multiple resolutions and bitstream
scales, these scales can share this overhead, thus requiring it to be included only once in the data stream.

Preserving the MB identity also simplifies significantly the derivation of motion estimation vector data for
all resolution scales other than the highest resolution. Essentially, the motion vector data corresponding to
any resolution scale can be derived from the highest resolution motion vector data by appropriately scaling
it down. For example, the x- and y-motion-vector components at 1/4 resolution are 1/2 the corresponding
full resolution components. Alternatively, the full resolution motion vectors can be derived by appropriate
scale up of lower resolution motion vectors. In the latter case, an additional correction may be added at the
higher resolution scale to improve the precision of the motion vector data.
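
The motion vector scaling described above can be sketched in Python. The text gives the factor 1/2 for the 1/4
resolution scale; applying the same rule with a linear factor of 4 or 8 for the 1/16 and 1/64 scales is an
inference made for this example, as are the function names and the use of exact fractions rather than any
particular rounding or sub-pixel precision.

```python
from fractions import Fraction


def scale_down_mv(mv, linear_factor):
    """Derive a lower-resolution vector from a full-resolution one.

    linear_factor is the per-axis reduction (2 for the 1/4 resolution
    scale). Exact fractions are kept so that no rounding rule is implied.
    """
    return tuple(Fraction(component, linear_factor) for component in mv)


def scale_up_mv(mv, linear_factor, correction=(0, 0)):
    """Derive a higher-resolution vector from a lower-resolution one,
    optionally adding a transmitted correction to restore precision."""
    return tuple(component * linear_factor + delta
                 for component, delta in zip(mv, correction))


if __name__ == "__main__":
    print(scale_down_mv((6, -4), 2))
    print(scale_up_mv((3, -2), 2, correction=(1, 0)))
```

The `correction` argument models the optional refinement: scaling (3, -2) up by 2 can only produce even components,
so a small correction is what lets the higher scale reach a vector such as (7, -4).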

A second aspect of this invention is that the methodology for coding an MB is also preserved. As noted
above, in MPEG-1 an MB is subdivided into six 8x8 blocks of luminance and chrominance information,
each block being coded using the 8x8 Discrete Cosine Transform (DCT). In the present technique, each
scaled MB is also divided into six sub-blocks of luminance and chrominance information, each sub-block
being coded using a DCT of appropriate size. Thus, for the 1/4 resolution MB in Fig. 6, a DCT of size 4x4
would be used. Incidentally, it should be noted that such 4x4 DCT, for example, can be derived from the
corresponding 8x8 DCT coefficients in a variety of ways, such that no explicit implementation of a 4x4 DCT
is required.
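
One of the possible ways of obtaining 4x4 coefficients from 8x8 coefficients is sketched below in Python: keep the
upper-left 4x4 corner of the 8x8 coefficients, rescale it, and treat it as a 4x4 DCT. This is offered only as an
illustration of the idea; the orthonormal DCT-II normalization, the rescaling gain of size/8, and the helper names
are assumptions of the example and are not stated in the text.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def reduced_coefficients(coeffs8, size):
    """Upper-left size x size corner of 8x8 coefficients, rescaled by
    size/8 so that a flat block keeps its pixel value (assumed gain)."""
    gain = size / 8.0
    return [[coeffs8[m][n] * gain for n in range(size)] for m in range(size)]


def reduced_block(coeffs8, size):
    """Reconstruct a size x size pixel block from 8x8 DCT coefficients."""
    return idct2(reduced_coefficients(coeffs8, size))


if __name__ == "__main__":
    ramp = [[100 + 2 * x + 3 * y for x in range(8)] for y in range(8)]
    for row in reduced_block(dct2(ramp), 4):
        print([round(v, 1) for v in row])
```

In a decoder only the inverse transform of the reduced size would be needed, since the 4x4 coefficients arrive
directly in the bit stream; the forward 8x8 transform in the usage lines is there only to produce test data.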

Finally, a third aspect of this invention is related to the methodology for generating the DCT block data
needed for reconstructing a picture at a target resolution scale. In the inventive technique, the DCT
coefficient data at any resolution or bitstream scale are coded by standard differential coding techniques
using the DCT data of a hierarchically lower scale as predictors.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figures 1-4 illustrate layers of compressed data within the video compression layer of the MPEG-1
data stream, i.e., Figure 1 depicts the Groups of Frames or Pictures (GOPs), Figure 2 depicts the
Macroblock (MB) subdivision of a picture, Figure 3 depicts an exemplary Slice subdivision of a frame, and
Figure 4 depicts the Block subdivision of a Macroblock.

Figure 5 illustrates the two-level motion compensation among pictures in a GOP employed in the
MPEG-1 standard.

Figure 6 illustrates how macroblocks are scaled at various resolution scales according to the present
invention. Note that macroblocks can also be scaled to resolutions higher than the full resolution shown in
the Figure.

Figure 7 illustrates the hierarchical prediction of DCT coefficients corresponding to the scales shown in
Figure 6.

Figure 8 is a block diagram of a decoder that can decode three of the four resolution scales of Figures
6 and 7. Note that several of the blocks could be discarded if only one resolution scale is desired as an
output.

Figure 9 is a block diagram of a decoder with bitstream scalability.

Figure 10 is a block diagram of a hierarchical decoder with dequantization prior to prediction.

Figure 11 is a block diagram of a flexible scalable video compression encoder implementation that can
be used with the present invention.

Figure 12a is a block diagram of one version of a Transform Unit that can be used in the flexible scalable
video compression encoder implementation of Figure 11.

Figure 12b is a block diagram of another version of a Transform Unit that can be used in the flexible
scalable video compression encoder implementation of Figure 11.

Figure 13a is a block diagram of one version of a Hierarchical Prediction Unit that can be used in the
flexible scalable video compression encoder implementation of Figure 11.

Figure 13b is a block diagram of another version of a Hierarchical Prediction Unit that can be used in
the flexible scalable video compression encoder implementation of Figure 11.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Before presenting a description of the particular implementations of the present invention, it should be
explained that the wide range of applications in which digital video is expected to play a role imposes many
conflicting requirements for video compression algorithms. These conflicts are manifested in terms of
standards compatibility, encoder and decoder implementation complexity, functionality, image quality, etc. It
is believed that these conflicting requirements cannot be satisfied by a single coding algorithm, but instead
require a flexible architecture of algorithms which can be matched to the requirements of the specific
application. Such flexibility may be used to satisfy many of these conflicting requirements, while still
preserving a great deal of compatibility amongst the different manifestations of the architecture. It is
believed, for example, that while different encoder implementations may be needed for different
applications, it should be possible to implement a single decoding device which can decode all bitstreams that
conform to the flexible architecture. It is also believed that such a generic decoder should not be
excessively complex. One object of this invention is to provide such a generic decoder.

Several applications will benefit from resolution and bitstream scalability features as described above.
The methods and apparatus of the invention support resolution and bitstream scaling by hierarchical coding
of 8x8 DCT components. Although DCTs of other sizes could be used, the preferred embodiment starts with
8x8 DCTs because that is the choice of several standard compression algorithms.

For resolution scaling, the invention may be used to provide, for example, up to four levels of resolution. 
The lowest possible resolution is attained by coding the equivalent of the upper left component of an 8x8 
DCT block; this resolution is 1/64 of the original resolution. Next, a resolution of 1/16 of the original 
resolution may be obtained by coding the equivalent of the upper left 2x2 coefficients of the DCT block. 
Coding of the equivalent of the upper left 4x4 coefficients leads to a resolution of 1/4 of the original 
resolution. Finally, the coding of all 8x8 coefficients leads to the full resolution video. 
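
By way of illustration only, the four resolution levels can be sketched as a selection of the upper left k x k coefficients of an 8x8 block. The function names and the placeholder block contents below are assumptions made for this sketch and are not part of the described apparatus.

```python
def low_band(block, k):
    """Return the upper-left k x k coefficients of a square coefficient block."""
    return [row[:k] for row in block[:k]]


def resolution_fraction(k, n=8):
    """Fraction of the original pixel count carried by a k x k band of an n x n block."""
    return (k * k) / (n * n)


# Placeholder 8x8 block: entry (r, c) holds 8*r + c so the selection is easy to see.
block = [[8 * r + c for c in range(8)] for r in range(8)]

for k in (1, 2, 4, 8):
    band = low_band(block, k)
    print(f"{k}x{k}: {len(band) * len(band[0])} coefficients, "
          f"{resolution_fraction(k)} of full resolution")
```

The fractions 1/64, 1/16, 1/4 and 1 follow directly from the ratio of retained coefficients to the 64 coefficients of the full block.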

Given a fixed resolution, the invention supports bitstream scaling by coding multiple hierarchical layers 
at the same resolution, but with progressively finer quantization factors. These layers would result in video 
of the same spatial resolution and of increasing quality. 

In this case, the first and lowest layer of the hierarchy is coded with coarse quantization and the higher 
layers are coded with quantization of increased precision. 

The architecture is flexible: one or more of these hierarchical layers can be stacked up in order of 
increasing resolution or precision, such that the reconstructed coefficients at one level of resolution are used 
to predict the corresponding coefficients at the next level of resolution. Except for the lowest hierarchical 
layer, transform coefficients in any layer are coded differentially with respect to their prediction. An encoder 
might be chosen, for example, to generate a bitstream that only contains data for the full and the 1/16 
resolution scales. In such case, the coefficients at the 1/16 resolution layer would be used to predict the 
coefficients at the full resolution layer. Alternatively, for bitstream scalability, a coarsely quantized set of 8x8 
DCT coefficients in one layer can serve as prediction for the corresponding set of 8x8 coefficients in the 
next layer, which are more finely quantized. 

The basic characteristic of the architecture of the invention is that the identity of the MacroBlock (MB) 
structure of MPEG-1 is preserved across all resolution and precision layers. Figure 6 shows this feature for 
the case of coding with four levels of resolution. It is important to preserve this identity because, as noted in 
the description of the MPEG-1 algorithm, an MB is associated with a series of attributes which contribute 
to the amount of overhead data incorporated in a compressed data stream. Preserving the MB identity 
permits the reuse of this overhead data for all hierarchical layers. For example, the motion vector data 
corresponding to any resolution scale can be derived from the highest resolution motion vector data by 
appropriate scaling. 
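
As a hedged sketch of such scaling, the following assumes that a layer built from k x k coefficients has a linear (per-axis) size of k/8 of the full picture, so the vector components are multiplied by k/8. The function name and the sample vector are hypothetical, and any rounding to the motion vector precision of the target layer is deliberately left to the caller.

```python
from fractions import Fraction


def scale_motion_vector(mv, k, n=8):
    """Scale a full-resolution vector (dx, dy) to the k x k layer of an n x n hierarchy."""
    factor = Fraction(k, n)  # linear (per-axis) scale, e.g. 4/8 for the 1/4 resolution layer
    return (mv[0] * factor, mv[1] * factor)


full_mv = (12, -6)  # hypothetical full-resolution vector, in pixels
for k in (8, 4, 2, 1):
    dx, dy = scale_motion_vector(full_mv, k)
    print(f"{k}x{k} layer: dx={float(dx)}, dy={float(dy)}")
```

Exact fractions are used so that the sketch does not commit to a particular rounding rule.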

Multiplexing of scaling layers 

Although not the subject of this invention, it is important to note that, before transmission or storage, the 
data for the various scales must be multiplexed by an encoding device. There are many options for 
implementing this multiplexing. For example, the data for a full picture at each level of hierarchy could be 
catenated in the order of increased resolution and precision. It will be understood that the data contains 
signals representative of information related to the picture elements or pixels of the video images being 
handled. These signals are typically electrical signals that are processed in appropriate electronic devices, 
such as video cameras, computers, and ancillary equipment, as will be thoroughly familiar to and 
understood by those of skill in the art. 

Quantization of resolution and bitstream scales 

In the preferred embodiment, the quantization of each hierarchical layer uses the same matrices 
specified in MPEG-1. Thus, the DCT data for the various resolution scales is derived from the full 
resolution 8x8 DCT matrix. If the MPEG-1 quantization matrix of weights is denoted by Q8, the matrices of 
weights for lower resolution DCTs are derived as follows: 



Quantizer    DCT    Factor 
Q1           1x1    1/8 Q8 
Q2           2x2    1/4 Q8 
Q4           4x4    1/2 Q8 

It should be noted that the present invention also covers the case where generic quantization matrices 
are used at each scale, but this only makes the decoding device more complex. 
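
The derivation of the scaled matrices of weights can be sketched as follows. It is an assumption of this sketch that each lower resolution matrix is taken from the upper left k x k entries of Q8 and multiplied by the listed factor (k/8); the Q8 values used here are placeholders and are not the MPEG-1 default weights.

```python
def scaled_weights(q8, k):
    """Derive the k x k matrix of weights from the 8x8 matrix, scaled by k/8."""
    factor = k / 8
    return [[factor * w for w in row[:k]] for row in q8[:k]]


# Placeholder 8x8 weights (NOT the MPEG-1 default matrix), growing with frequency.
q8 = [[16 + 2 * (r + c) for c in range(8)] for r in range(8)]

q1 = scaled_weights(q8, 1)  # factor 1/8
q2 = scaled_weights(q8, 2)  # factor 1/4
q4 = scaled_weights(q8, 4)  # factor 1/2
print(q1)
print(q2)
```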

Hierarchical prediction of resolution and bitstream scales 

In the preferred embodiment, DCT coefficients in a hierarchical layer are used to predict the corresponding 
coefficients in the next layer of the hierarchy. An example is illustrated in Figure 7, wherein a 
hierarchy of 4 resolution layers is shown. The prediction algorithm is a simple one-to-one mapping of the 
properly scaled coefficients. However, it should be noted that other prediction algorithms could be used, 
again at the cost of increased complexity. 

Provisions for rate control of hierarchical layers 

In the preferred embodiment, the quantization parameter of MPEG-1, QP, is used at the lowest layer of 
the hierarchy. QP parameters at other layers of the hierarchy are specified with reference to this lower layer 
QP. For example, a high layer QP parameter could be specified to be twice the lower layer QP. 

Scalable decoder implementations 

The invention involves an architecture with a flexible number of hierarchical layers. However, for ease of 
explanation, two three-layer decoders that exemplify the resolution and bitstream scalability features will 
be described. The decoder apparatus, which is shown in Figure 8, supports 2x2 (low), 4x4 (medium), and 
8x8 (high) resolution scales. Decoders with only one target resolution scale can be implemented by 
eliminating the boxes in Figure 8 that are not necessary to achieve the target. After entropy decoding and 
demultiplexing the compressed data for the three resolution scales, there will be, for every 8x8 block of data, 
corresponding 2x2 and 4x4 block data, all of which are necessary to build the final 8x8 matrix of DCT 
coefficients. 

In the preferred embodiment, the following steps are followed to arrive at the full resolution 8x8 DCT 
coefficients. After dequantizing by the qp2 quantization parameter of the 2x2 layer, the low resolution 2x2 
blocks are used as a prediction to the four lowest order coefficients of the corresponding 4x4 blocks. These 
predictions are summed to the dequantized 4x4 coefficients, where dequantization of the 4x4 coefficients is 
performed by the qp4 quantization parameter. The results of the previous sum are similarly used as 
prediction to the 16 lowest order coefficients of the corresponding 8x8 blocks. These predictions are 
summed to the dequantized 8x8 coefficients, where dequantization of the 8x8 coefficients is performed by 
the qp8 quantization parameter. 
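
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: dequantization is modeled as a plain multiplication by the layer's quantization parameter, the prediction is the one-to-one mapping of the preferred embodiment, and the function names and sample levels are hypothetical. Dequantization by the matrix of weights and the inverse transform are not shown.

```python
def dequantize(levels, qp):
    """Model dequantization as multiplication by the layer's quantization parameter."""
    return [[qp * v for v in row] for row in levels]


def add_prediction(coeffs, pred):
    """Add a smaller prediction block onto the lowest-order (upper-left) coefficients."""
    out = [row[:] for row in coeffs]
    for r, pred_row in enumerate(pred):
        for c, p in enumerate(pred_row):
            out[r][c] += p
    return out


def reconstruct(q2, q4, q8, qp2, qp4, qp8, target=8):
    """Rebuild the coefficient block, stopping at the target scale (2, 4 or 8)."""
    b = dequantize(q2, qp2)
    if target == 2:
        return b
    b = add_prediction(dequantize(q4, qp4), b)
    if target == 4:
        return b
    return add_prediction(dequantize(q8, qp8), b)


def zeros(n):
    return [[0] * n for _ in range(n)]


# Hypothetical decoded levels for one block at each layer.
q2 = [[3, 1], [0, -1]]
q4 = zeros(4); q4[0][0] = 1; q4[2][3] = 2
q8 = zeros(8); q8[0][0] = -1; q8[7][7] = 1

b4 = reconstruct(q2, q4, q8, qp2=8, qp4=4, qp8=2, target=4)
b8 = reconstruct(q2, q4, q8, qp2=8, qp4=4, qp8=2, target=8)
print(b4[0], b8[0])
```

Stopping at `target=4` or `target=2` corresponds to decoding at a lower resolution scale, where the summation of predictions ends at the desired layer.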

Note that the DCT coefficients are only dequantized by quantization parameters as the final matrix of 
coefficients is rebuilt. Dequantization by the quantization matrix of weights is only needed once the final 
resolution is reached. This feature is possible because, in the preferred embodiment, the matrices of weights 
at the various scales are proportionally related as explained above. 

The final 8x8 matrix of coefficients can now be used to reconstruct the full resolution picture using 
MPEG-1 techniques, including motion compensated prediction. Referring to Figure 8 in this regard, it will 
be appreciated that at the full resolution level the 16x16 MCP unit represents a generic MPEG-1 Motion 
Compensation Prediction unit operating on an MB; the IDCT box is a unit for performing a standard 
MPEG-1 8x8 inverse transformation; and the Q8^-1 box represents a unit for performing inverse quantization 
by the corresponding MPEG-1 matrix of weights. 

To decode video pictures at other resolution scales a similar procedure is followed, except that the 
process of summing predictions to dequantized coefficient data stops at the resolution at which it is desired 
to decode. The operation of dequantization by a matrix of weights, represented by Q4^-1 and Q2^-1, uses 
the scaled matrices of the preferred embodiment. The operation of inverse transformation is performed by 
using a transform of the appropriate size. Thus, for decoding at 1/4 resolution, a 4x4 inverse DCT should be 
used. The one-dimensional DCT matrices appropriate for decoding at the three "scaled" resolutions 
supported are: 
Both the DCT(1x1) and the DCT(2x2) are trivial and should be easy to implement, even in software. 
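
As a hedged illustration, one-dimensional DCT matrices of sizes 1, 2 and 4 can be generated as shown below. The sketch assumes the orthonormal DCT-II normalization, which may differ by a constant factor from the scaling intended for the scaled decoders; the function name is hypothetical.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


for n in (1, 2, 4):
    print(f"DCT({n}x{n}):")
    for row in dct_matrix(n):
        print("  ", [round(v, 4) for v in row])
```

Under this normalization the 1x1 case is a single unit entry and the 2x2 case reduces to a scaled sum and difference, which is consistent with both being trivial to implement.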

It should be noted that MCP units at other resolutions share the same motion vector data MV to 
generate motion compensated predictions for the scaled MBs at the various resolution scales. Of special 
note is that when using motion compensation techniques, the full resolution motion vector should be scaled 
appropriately to match the decoder resolution. This has been described previously. 

A decoder that implements bitstream scalability is shown in Figure 9. The operation of this decoder is 
very similar to that of Figure 8, except that only 8x8 operations are used to produce output pictures of 
increasing levels of quality. For this reason the various units (e.g., Q8^-1, IDCTs, MCP, etc.) can be 
physically implemented by a single piece of hardware and shared by the various hierarchical layers. 

Finally, an alternative decoder implementation that is not part of the preferred embodiment, but in 
keeping with the invention, is shown in Figure 10. This decoder operates in the same manner as that in 
Figure 8. However, it permits any quantization weight matrices to be used in the various hierarchical layers. 
It also implements generic means of predicting DCT coefficients from a lower hierarchical layer. In this 
sense, P2 is used to predict 4x4 coefficients from 2x2 coefficients, and P4 is used to predict 8x8 coefficients 
from 4x4 coefficients. There are many prediction algorithms that may be used, including the one-to-one 
mapping of the preferred embodiment. 

Encoder implementations 






There are many possible implementations of encoders compatible with the decoders of the present 
invention described above; however, by way of example, two such encoders, each of which is only 
designed for resolution scaling, will be described. The general structure of a three resolution layer encoder 
is shown in Figure 11. The encoder is divided into three parts. The first part is a transform unit which takes 
the digital video input and outputs DCT data for the three resolution layers: d(8x8), d(4x4), and d(2x2). The 
second part is a hierarchical prediction unit which takes the transform unit's DCT output, and outputs 
quantized differential DCT data at all resolution layers: q(8x8), q(4x4), and q(2x2). This output is multiplexed 
and entropy coded in the third unit to generate the final compressed video. The prediction unit also 
generates reconstructed DCT data: b(8x8), b(4x4), and b(2x2), which is fed back to the transform unit to 
complete the loop of typical hybrid transform codecs. 

Figure 12a shows a simple implementation of the transform unit. In this implementation the 8x8 layer 
contains elements which are also part of MPEG-1 encoders and, more generically, part of motion 
compensated hybrid transform encoders. The upper branch contains: a summer (Σ) to generate a motion 
compensated prediction difference; a forward 8x8 DCT transform to generate the 8x8 transform coefficients 
of said prediction difference; and a unit for quantization by a matrix of weights (Q8). The output is a set of 
partially quantized DCT coefficients, d(8x8). The return or feedback branch receives a set of partially 
reconstructed DCT coefficients, b(8x8), and then processes them with the following units: an inverse 
quantizer for the matrix of weights (Q8^-1); an inverse DCT transformer (IDCT 8x8) that reconstructs the 
prediction differences; a summer to add the reconstructed prediction differences to the motion compensated 
prediction, thus reconstructing the original picture data; and finally, a Motion Compensation 
Prediction (MCP) unit to generate the prediction for the next picture. In MPEG-1 this MCP unit operates on 
16x16 MBs as shown in the Figure, but other block sizes could be used. In this implementation of the 
transform unit, the d(2x2) and d(4x4) coefficients are simply extracted from the corresponding 8x8 
coefficients. It should be noted that it is also possible to derive the d(2x2) and d(4x4) coefficients through 
other reduction or weighting algorithms applied to the 8x8 coefficients. 

Note that because there is no feedback loop in the lower resolution scales, this encoder will result in 
accumulation of quantization and motion compensation errors at these resolution scales. The error, however, 
will be naturally reset back to zero whenever a new Group of Pictures starts. While the quality of the lower 
resolution layers will be limited by this accumulation of errors, the simplicity of the encoder makes this 
approach attractive. In particular, if what is required is only bitstream scalability, this approach is all that is 
needed. 

Another implementation of the transform unit is shown in Figure 12b. This version is similar to the one 
in Figure 12a; however, d(2x2) and d(4x4) are generated by similar, but completely independent loops 
operating at each resolution scale. In this sense, the H units are used to filter and reduce the 
resolution of the input video by a factor of 1/4 in each case. In this manner each layer takes an input of the 
appropriate resolution. All operations such as DCT, quantization, and MCP are also scaled according to the 
resolution of the layer. 

At a cost of increased complexity, this version would generally produce better quality low-resolution 
pictures than those produced by the unit in Figure 12a. In this case, coding errors will not accumulate more 
than one picture period. Note, however, that the results of the 16x16 motion estimation can be shared by all 
resolution loops since, in the invention, motion vectors are one of the attributes shared by MBs at all scales. 
This implementation is more appropriate for applications where the quality of the low resolution video is 
important. 

Figure 13a shows one implementation of the hierarchical prediction unit. First, the hierarchical prediction 
differences for the 4x4 and 8x8 layers are generated in the summers. All layers are then quantized by their 
respective quantizer parameters and the results are output as q(2x2), q(4x4), and q(8x8). These results are 
also inverse quantized by the corresponding quantizer parameters and then added in the two other 
summers to generate the partially reconstructed b(2x2), b(4x4), and b(8x8) data, which is fed back to the 
transform unit. Figure 13b shows a rearrangement of the same elements which can be used as an 
alternative implementation for the hierarchical prediction unit. 
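
A minimal sketch of the data flow of such a hierarchical prediction unit is given below. It assumes that quantization is division by the layer's quantizer parameter followed by rounding to the nearest integer, and that the lower layer inputs are extracted from the upper left of the 8x8 input; both are assumptions of the sketch, and all names and sample values are hypothetical.

```python
def quantize(values, qp):
    """Assumed quantizer: divide by qp and round to the nearest integer level."""
    return [[round(v / qp) for v in row] for row in values]


def dequantize(levels, qp):
    return [[qp * v for v in row] for row in levels]


def combine(coeffs, pred, sign):
    """Add (sign=+1) or subtract (sign=-1) pred on the upper-left corner of coeffs."""
    out = [row[:] for row in coeffs]
    for r, pred_row in enumerate(pred):
        for c, p in enumerate(pred_row):
            out[r][c] += sign * p
    return out


def hierarchical_prediction(d2, d4, d8, qp2, qp4, qp8):
    """Return quantized differential data (q2, q4, q8) and reconstructions (b2, b4, b8)."""
    q2 = quantize(d2, qp2)
    b2 = dequantize(q2, qp2)
    q4 = quantize(combine(d4, b2, -1), qp4)      # difference against the 2x2 prediction
    b4 = combine(dequantize(q4, qp4), b2, +1)
    q8 = quantize(combine(d8, b4, -1), qp8)      # difference against the 4x4 prediction
    b8 = combine(dequantize(q8, qp8), b4, +1)
    return (q2, q4, q8), (b2, b4, b8)


# Hypothetical partially quantized input; lower layers extracted from the 8x8 block.
d8 = [[(3 * r + 5 * c) % 17 - 8 for c in range(8)] for r in range(8)]
d4 = [row[:4] for row in d8[:4]]
d2 = [row[:2] for row in d8[:2]]

(q2, q4, q8), (b2, b4, b8) = hierarchical_prediction(d2, d4, d8, qp2=8, qp4=4, qp8=2)
print(q2, b8[0])
```

Because each layer quantizes only the difference left by the layer below, the reconstruction error at the 8x8 layer in this sketch is bounded by half of the 8x8 quantizer parameter.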

It should be noted that one skilled in the art can design further encoding alternatives that are compatible 
with the decoding methodology and architecture of the present invention. 

Claims 






A method for producing a compressed video data representation that is capable upon decompression 
of being displayed on a video screen at a multiplicity of hierarchical scales of picture resolution and/or 
quality, comprising the steps of: 

providing video picture element data signals indicative of spatial block units, macroblocks, said 
macroblocks having associated therewith information on compressed picture data and a set of coding 
attributes comprising coding decisions, motion compensation vectors, and quantization parameters; and 
producing, for each of said macroblocks, a corresponding scaled macroblock at every other scale of 
said multiplicity such that the same coding attributes are shared by said scaled macroblocks. 

A method as in claim 1 wherein said compressed picture data associated with a given macroblock 
corresponds to quantized transform coefficients of the appropriate scale. 

A method as in claim 1 wherein said information on said set of attributes is such that said attributes are 
appropriately modifiable to conform to the scale of said scaled macroblocks upon decompression of 
said associated information. 

A method as in claim 1 further comprising the step of selecting a decompression target scale and 
performing the operation of inverse transformation only at the target scale, said inverse transformation 
operating on final transform coefficient data, said final transform coefficient data for a target scaled 
macroblock being derived from transform coefficients of the corresponding scaled macroblocks in all 
lower scales and the transform coefficient data of the target scale. 

A method as in claim 1 wherein said producing step comprises intermixing information on a variable 
number of resolution scales with a variable number of bitstream scales. 
A method as in claim 1 further comprising the step of omitting some of the scaled Macroblock (MB) 
data when found in error. 